Robust Accurate Statistical Annotation of General Text
نویسندگان
چکیده
We describe a robust accurate domain-independent approach to statistical parsing incorporated into the new release of the ANLT toolkit, and publicly available as a research tool. The system has been used to parse many well known corpora in order to produce data for lexical acquisition efforts; it has also been used as a component in an open-domain question answering project. The performance of the system is competitive with that of statistical parsers using highly lexicalised parse selection models. However, we plan to extend the system to improve parse coverage, depth and accuracy.
منابع مشابه
Annotation in Architecture: A Systematic Approach toward Mobilization and Development of Theoretical, Research, and Critical Basis in Architecture
Annotations usually refer to marginal notes that explain a difficult or ambiguous subject, provide a general definition or a critical remark for a particular part of a text. Historically, annotating was a well-known tradition in Islamic sciences and was used especially in times when there were less new potentials for generating new knowledge. The main question of this research is, can the tradi...
متن کاملThe Second Release of the RASP System
We describe the new release of the RASP (robust accurate statistical parsing) system, designed for syntactic annotation of free text. The new version includes a revised and more semantically-motivated output representation, an enhanced grammar and part-of-speech tagger lexicon, and a more flexible and semi-supervised training method for the structural parse ranking model. We evaluate the releas...
متن کاملSemi-Automatic Annotation Tool to Build Large Dependency Tree-Tagged Corpus
Corpora annotated with lots of linguistic information are required to develop robust and statistical natural language processing systems. Building such corpora, however, is an expensive, labor-intensive, and time-consuming work. To help the work, we design and implement an annotation tool for establishing a Korean dependency tree-tagged corpus. Compared with other annotation tools, our tool is ...
متن کاملResource Report: Building Parallel Text Corpora for Multi-Domain Translation System
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. However, manual translations are very costly, and the number of known parallel text is limited. Hence, our research started with creating and collecting a large amount of parallel text resources for Indonesian-English. We describe in this paper the creation ...
متن کاملInducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora
This paper describes a system and set of algorithms for automatically inducing stand-alone monolingual part-of-speech taggers, base noun-phrase bracketers, named-entity taggers and morphological analyzers for an arbitrary foreign language. Case studies include French, Chinese, Czech and Spanish. Existing text analysis tools for English are applied to bilingual text corpora and their output proj...
متن کامل